SUMMARY PAGE


PROBLEM 

Sometimes it is necessary to measure some aspect of human performance
capability repeatedly. For example, repeated measurements are necessary
to track the time course of onset of, or recovery from, the effects of
stressors such as drugs, diseases, and exposure to toxic materials or unusual
environments (e.g., vibration, high altitude, undersea compression, heat,
cold). These repeated measurements usually cannot be made with the same
human performance test because the subjects are influenced by their previous
responses. The problem is to make different but equivalent test forms for
each occasion of measurement.


FINDINGS 

Computers can be programmed to sample alternate forms of a test from
the population of all possible test items of the type in the test. These
test-generation methods have proven to be useful, economical, and rapid.
Eight illustrative tests are provided.


RECOMMENDATIONS 

Use sampling techniques to generate alternate forms of human performance
tests. The sampling techniques can be implemented on a computer for
printing multiple copies of multiple forms of tests.


The work was funded by the Naval Medical Research and Development Command
and by the Biological Sciences Division of the Office of Naval Research.




INTRODUCTION 


Human performance tests are samples of behavior. They are used to
infer capabilities that may be reflected in many other types of behavior,
including occupational performance. Sometimes it is necessary to sample the
same subjects repeatedly, for example, to monitor recovery from an affliction
(e.g., Bell, Jurek, & Wilson, 1976), to assess the effect of an environmental
stressor (e.g., Carter, 1979), or to evaluate the effectiveness of training
(Goldstein, 1974). In these repeated-measures applications, different but
equivalent forms of tests must be used for each occasion of measurement,
because a subject might recall the answers if the same form of the test were
reused.

Equivalence of numerous "alternate forms" of some tests has been
demonstrated empirically. The Wonderlic Test of general mental ability is
available in 14 forms. Moran, Kimble, and Mefferd (1964) have published 20
alternate forms of tests of five specific mental abilities, and Horne
(1972) has similarly developed alternate forms for repeated measurements.
Empirical verification of the equivalence of such tests is a tedious and
controversial procedure. It involves examining test characteristics such as
the number of items, item difficulties, item variances, item covariances,
and item-criterion covariances (Horst, 1968). A much easier and theoretically
appealing approach to alternate forms of tests can be based on sampling
techniques (Cochran, 1977). In this approach, alternate forms of tests are
created by sampling randomly from a homogeneous population of test items.

This report describes the application of sampling techniques, implemented
on a computer, to generate alternate printed forms of performance tests.
Use of the computer enables one to create any number of copies of any number
of forms of a test.

Relevant Sampling Concepts (Cochran, 1977)

In this report, alternate forms of tests are considered to be samples of
test items. A sample is a part of an aggregate. The most fundamental
concept in sampling theory is the population, which is the aggregate from
which a sample is chosen. For example, in sampling arithmetic items the
population could be any of the following:

- all possible arithmetic items;
- all addition problems;
- all problems involving addition of three positive two-digit integers
  arrayed vertically.

It is important to define the population precisely, so that the sample
will include only items that are relevant to the purpose of the sample.

Error of measurement is another important concept in sampling theory.
Errors of measurement occur when what is measured differs from what was
intended to be measured. If the population is not precisely defined, the
sample may include unintended items. For example, if any addition problem
involving one- or two-digit numbers is allowed, then zero may occasionally
be sampled as one of the numbers, and the problem will not really require
addition skill.
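The zero example can be made concrete with a minimal Python sketch. It is not the report's Fortran IV code, and it assumes each operand is drawn independently and uniformly. The names `LOOSE_RANGE`, `PRECISE_RANGE`, `sample_item`, and `share_with_zero` are illustrative. Defining the population as positive two-digit integers (10 to 99) excludes zero by construction. The loose definition (0 to 99) admits degenerate items.

```python
import random

# Loose definition: any one- or two-digit number, so zero can be drawn.
LOOSE_RANGE = range(0, 100)
# Precise definition: positive two-digit integers only (10..99).
PRECISE_RANGE = range(10, 100)


def sample_item(rng: random.Random, population: range, n_operands: int = 3) -> tuple:
    """Draw one addition item: n_operands numbers, each uniform over the population."""
    return tuple(rng.choice(population) for _ in range(n_operands))


def share_with_zero(population: range, n_operands: int = 3) -> float:
    """Exact proportion of items in which at least one operand is zero."""
    p_nonzero = sum(1 for x in population if x != 0) / len(population)
    return 1.0 - p_nonzero ** n_operands


rng = random.Random(1981)
items = [sample_item(rng, PRECISE_RANGE) for _ in range(1000)]
print(share_with_zero(LOOSE_RANGE))    # about 0.0297: roughly 3 items in 100 are degenerate
print(share_with_zero(PRECISE_RANGE))  # 0.0
```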

The most useful concept of sampling theory is the random sample. If a
sample of n items from among N in the population is random, then each of the

$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$

possible samples of n from among N has an equal chance of being selected
(where $N! = N \cdot (N-1) \cdot (N-2) \cdots 1$). If the n items are drawn one
at a time from among the N original items in the population, then a random
sample gives an equal chance of selection to any item in the population not
already drawn.

A random sample has two properties of major importance:

- representativeness;
- analytically derivable variances of sample statistics.

The representativeness of a random sample is obvious: every segment of the
population has as much chance of being included as any other comparable
segment. Random samples also enable one to calculate the variance of sample
statistics (such as the mean, total, or proportion) without drawing repeated
samples. The variances follow from analytic expressions rather than from
empirical exercises requiring vastly more resources.

Random sampling of test items requires that each possible test item has an
equal chance of selection. For example, there are 90 two-digit numbers, so
there are 90 x 90 = 8100 possible addition problems involving two two-digit
numbers. In a random sample of one item, each problem has one chance in 8100
of being selected.
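The counting in the paragraph above can be checked directly. This Python sketch uses only the standard-library `math`, `random`, and `itertools` modules. The population (pairs of two-digit addends) is the report's example; the function names are illustrative. `number_of_samples` evaluates the binomial coefficient. `random.Random.sample` draws without replacement, so on each draw every item not yet drawn has the same chance.

```python
import math
import random
from itertools import product

# Population: all addition problems with two two-digit addends.
TWO_DIGIT = range(10, 100)                       # 90 numbers
POPULATION = list(product(TWO_DIGIT, repeat=2))  # 90 * 90 = 8100 problems


def number_of_samples(N: int, n: int) -> int:
    """Number of distinct samples of n items from N: N! / (n! (N - n)!)."""
    return math.comb(N, n)


def random_form(rng: random.Random, n: int) -> list:
    """Simple random sample without replacement: every subset of size n is equally likely."""
    return rng.sample(POPULATION, n)


rng = random.Random(7)
form = random_form(rng, 24)
print(len(POPULATION))             # 8100
print(number_of_samples(8100, 1))  # 8100, so each problem has a 1-in-8100 chance
```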

Because the variance of sample statistics can be expressed analytically, we
can identify the conditions that lead to the most precise (minimum-variance)
samples. One result that is useful in test construction tells us how to
allocate test effort (time or number of items) among various types of test
items that measure the same thing, assuming that the items are randomly
sampled within each type. The optimum allocation can produce a large
increase in precision if the types of test items produce very different means
and variances of performance. For example, in Neisser's letter search task
(Neisser, Novick, & Lazar, 1964), the mean and variance increase with the
number of targets for which the subject is scanning.

Under the optimum allocation, the number of sampled items of each type is
proportional to the standard deviation of test scores for that type of item.
It is also inversely proportional to the square root of the cost of testing
(or test time) for that type. This assumes that cost increases linearly with
the number of items (Cochran, 1977, p. 98). A practical rule is to sample
more of a particular type of item if performance on that type is more
variable, or if sampling that type is cheaper. The score for a test made up
of parts is then the weighted average of the scores for the parts. The
weight for each part is the number of items in that part.
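The allocation rule and the composite score can be sketched in a few lines of Python. This assumes the item types carry equal weight, so Cochran's stratified-sampling result reduces to allocating effort in proportion to sd / sqrt(cost). The function names and the example numbers are illustrative. With equal cost per second and one part twice as variable as the other, 270 seconds split 90:180. That matches the 90-second and 3-minute parts of the letter search test described later.

```python
import math


def optimum_allocation(total: float, sds: list, costs: list) -> list:
    """Split total test effort across item types in proportion to sd / sqrt(cost)."""
    weights = [sd / math.sqrt(c) for sd, c in zip(sds, costs)]
    scale = total / sum(weights)
    return [w * scale for w in weights]


def composite_score(part_scores: list, items_per_part: list) -> float:
    """Weighted average of part scores, weighting each part by its number of items."""
    return sum(s * n for s, n in zip(part_scores, items_per_part)) / sum(items_per_part)


# Two item types, the second twice as variable, equal cost per second of testing:
print(optimum_allocation(270, sds=[1.0, 2.0], costs=[1.0, 1.0]))  # [90.0, 180.0]
```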

These sampling techniques are employed in the computer programs described
in the remainder of this report:

- defining the population;
- reducing errors of measurement by excluding peculiar items;
- random sampling of test items;
- optimum allocation of test effort among similar types of items with
  differing mean performance or dispersion.

The Tests


The tests were generated on paper by computerized sampling of items
and were modeled after tests that have been reported as useful in the
literature of performance testing.

- The tests of arithmetic computation (Number Facility) were like those
  described by Ekstrom, French, Harman, and Dermen (1976).
- The number comparison test (Perceptual Speed) was also taken from
  Ekstrom, French, Harman, and Dermen.
- The Code Substitution test was like that in the Wechsler Intelligence
  Scale (1958).
- The Grammatical Reasoning test was from Baddeley (1968).
- The Pattern Recognition test was similar to that used by Alluisi and
  Thurmond (1970). It was based on "metric figures" or histoforms invented by
  P. M. Fitts (Fitts, Weinstein, Rappaport, Anderson, & Leonard, 1956).
- The pattern comparison test, which is procedurally similar to the number
  comparison test, was reported by Klein and Armitage (1979).
- The letter search test is an adaptation by Rose (1974) of Neisser's
  experiment (Neisser, Novick, & Lazar, 1963).
- The number search test was similar to an experimental psychology task
  used by Green and Anderson (1956) and others. It requires a subject to find
  a target number located among other numbers randomly dispersed on a page of
  computer paper.

The subject's task, the method of sampling items, and some representative
items for each test are discussed in the next section.

Test Procedures and Items


Addition Test


The addition test requires the subject to add three two-digit numbers,
arranged vertically, and to record each sum. Some representative items are:


 34    14    91    46
 32    85    27    61
+73   +41   +26   +35

The test is conducted in two parts, each lasting two minutes, with a brief
rest between parts. Each part consists of 4 lines of 12 items presented on a
single page. The preferred score is the total number of correct items.

Data obtained with this test were reported by Bittner and Carter (1981).

A block diagram showing the construction of a single item and the Fortran
IV computer program for this test are given in Appendix A.

Other computational tests have also been programmed for item sampling. They
involve addition of three-digit numbers arranged horizontally, subtraction,
multiplication, and division. Because the arithmetic tests produce highly
correlated scores, only the simplest arithmetic test, addition, is presented
here.
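The addition test layout can be generated with the following Python sketch. It is not a translation of the Appendix A program. It produces two parts of 4 lines x 12 items, with each item made of three addends drawn uniformly from 10 to 99, and it returns a scoring key. Seeding the generator with a form number is a hypothetical convenience: it makes every printed copy of a given form identical while different form numbers give different forms.

```python
import random

LINES_PER_PART = 4
ITEMS_PER_LINE = 12


def make_addition_part(rng: random.Random) -> list:
    """One part of the test: 4 lines x 12 items, each item three two-digit addends."""
    return [
        [tuple(rng.randint(10, 99) for _ in range(3)) for _ in range(ITEMS_PER_LINE)]
        for _ in range(LINES_PER_PART)
    ]


def render_line(line: list) -> str:
    """Print one line of items vertically: two addends, then '+' and the third addend."""
    rows = [
        "  ".join(f" {a}" for a, _, _ in line),
        "  ".join(f" {b}" for _, b, _ in line),
        "  ".join(f"+{c}" for _, _, c in line),
    ]
    return "\n".join(rows)


def make_form(form_number: int, parts: int = 2) -> tuple:
    """Hypothetical scheme: the form number seeds the generator, so copies of a form match."""
    rng = random.Random(form_number)
    form = [make_addition_part(rng) for _ in range(parts)]
    key = [[[sum(item) for item in line] for line in part] for part in form]
    return form, key


form, key = make_form(form_number=3)
print(render_line(form[0][0]))
```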

Number Comparison

The number comparison test requires the subject to compare two adjacent
strings of three to nine digits. With probability .5 the strings differ in
one digit (chosen at random); otherwise they are identical. The subject
writes "S" on a line between the strings if they are the same, or "D" if
they are different. Some representative items are:

930 ____ 930          63983496 ____ 63903496

The test is conducted in one part lasting three minutes. There are 14 lines
of 3 items on each page, and the computer prints 5 pages of items. The
preferred score is the number of correct responses minus the number of
incorrect responses. Data obtained with this test were reported by Bittner
and Carter (1981). A block diagram showing construction of a single item and
the Fortran IV computer program for this test are presented in Appendix B.
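A single number comparison item can be generated as in this Python sketch, which assumes digits are drawn uniformly. It does not reproduce the Appendix B program. In a "different" item the replacement digit is drawn from the nine digits other than the original one, which guarantees the pair really differs in exactly one position. The `score` function implements the stated preferred score (correct minus incorrect). Treating blank responses (`None`) as neither correct nor incorrect is an assumption.

```python
import random

LINES_PER_PAGE, ITEMS_PER_LINE, PAGES = 14, 3, 5


def make_comparison_item(rng: random.Random) -> tuple:
    """Return (left, right, answer) where answer is 'S' (same) or 'D' (different)."""
    length = rng.randint(3, 9)
    left = "".join(rng.choice("0123456789") for _ in range(length))
    if rng.random() < 0.5:
        return left, left, "S"
    pos = rng.randrange(length)  # position of the altered digit, chosen at random
    new_digit = rng.choice([d for d in "0123456789" if d != left[pos]])
    right = left[:pos] + new_digit + left[pos + 1:]
    return left, right, "D"


def make_form(form_number: int) -> list:
    """All items for one form: 14 lines x 3 items x 5 pages = 210 items."""
    rng = random.Random(form_number)
    n_items = LINES_PER_PAGE * ITEMS_PER_LINE * PAGES
    return [make_comparison_item(rng) for _ in range(n_items)]


def score(responses: list, items: list) -> int:
    """Correct minus incorrect; blank (None) responses count as neither (assumption)."""
    right = sum(1 for r, (_, _, a) in zip(responses, items) if r == a)
    wrong = sum(1 for r, (_, _, a) in zip(responses, items) if r is not None and r != a)
    return right - wrong


items = make_form(1)
```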

Code Substitution


The code substitution test requires the subject to refer repeatedly to
a table of nine digit-letter pairs to find the digits that correspond to
randomly selected letters. The letters of the nine code pairs associated
with each form of the test are chosen at random from among the 26 letters
of the alphabet. Given a letter, the subject responds by writing the
corresponding digit under the letter. Some representative items are:


CODE    Q   E   H   G   U   P   J   M   C
DIGIT  (2) (1) (3) (8) (4) (9) (6) (5) (7)

        M   J   H   M   Q   H
       ( ) ( ) ( ) ( ) ( ) ( )

The test is conducted in one part lasting three minutes. The preferred
score is the number of correct responses. Data obtained with this test were
reported by Pepper, Kennedy, Bittner, and Wiker (1980). A block diagram
showing how to construct a single item and the Fortran IV computer program
for this test are given in Appendix C.
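The code table and the items for one form can be sampled with a short Python sketch. It is not the Appendix C program. Nine distinct letters are drawn from the 26, and they are paired with the digits 1 through 9 in random order. Each item is a letter drawn from the nine code letters. The number of items per form is a hypothetical parameter, since the report specifies only the three-minute time limit.

```python
import random
import string


def make_code(rng: random.Random) -> dict:
    """Nine letters chosen at random from the 26, paired with the digits 1-9 in random order."""
    letters = rng.sample(string.ascii_uppercase, 9)
    digits = rng.sample(range(1, 10), 9)
    return dict(zip(letters, digits))


def make_form(form_number: int, n_items: int) -> tuple:
    """Code table, letter items, and answer key for one form (n_items is hypothetical)."""
    rng = random.Random(form_number)
    code = make_code(rng)
    letters = list(code)
    items = [rng.choice(letters) for _ in range(n_items)]
    key = [code[letter] for letter in items]
    return code, items, key


code, items, key = make_form(form_number=5, n_items=6)
print("CODE  ", " ".join(f" {c} " for c in code))
print("DIGIT ", " ".join(f"({d})" for d in code.values()))
```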

Grammatical Reasoning


The grammatical reasoning test requires the subject to read and comprehend
a simple statement about the order of two letters, A and B. The subject then
observes AB or BA printed next to the statement. If the statement correctly
describes the order of the letters, the subject marks "T"; otherwise, the
subject marks "F".

The statements about the order of the letters vary in five ways:

- the verb is "precedes" or "follows";
- the voice is active or passive;
- the phrasing is affirmative or negative;
- the letters are printed in either order;
- the statement is true or false.

These five dichotomies lead to 2^5 = 32 possible test items. Representative
items are:

A precedes B               AB     T  F

B is not followed by A     BA     T  F

All items are used in each form of the test; the forms differ only in the
order of the items. There are 32! possible forms (32! ≈ 2.6 x 10^35). Forms
are constructed by choosing the order of the items at random. The test lasts
one minute. The preferred score is the number correct minus the number
incorrect. Data obtained with this test were reported by Carter, Kennedy,
and Bittner (1981). Appendix D gives the Fortran IV computer program that
generates this test.
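The 32 items and the random ordering can be enumerated with the Python sketch below. It is an illustration, not the Appendix D program. The sketch enumerates five dichotomies: verb, voice, phrasing, which letter is mentioned first, and printed order. For fixed values of the other four, swapping the first-mentioned letter flips whether the statement is true. This yields the report's true/false dichotomy and exactly 16 true and 16 false items. The wording templates such as "is not preceded by" are assumptions modeled on the two representative items.

```python
import itertools
import math
import random


def statement(subject: str, verb: str, passive: bool, negative: bool) -> str:
    """Build e.g. 'A precedes B' or 'B is not followed by A'."""
    other = "B" if subject == "A" else "A"
    if passive:
        phrase = f"is not {verb}ed by" if negative else f"is {verb}ed by"
    else:
        phrase = f"does not {verb}" if negative else f"{verb}s"
    return f"{subject} {phrase} {other}"


def is_true(subject: str, verb: str, passive: bool, negative: bool, pair: str) -> bool:
    """Truth of the statement for the printed pair ('AB' or 'BA')."""
    other = "B" if subject == "A" else "A"
    # Passive voice swaps the roles: "X is preceded by Y" means "Y precedes X".
    first, second = (other, subject) if passive else (subject, other)
    if verb == "precede":
        claim = pair.index(first) < pair.index(second)
    else:
        claim = pair.index(first) > pair.index(second)
    return claim != negative


ITEMS = [
    (statement(s, v, p, n), pair, is_true(s, v, p, n, pair))
    for v, p, n, s, pair in itertools.product(
        ("precede", "follow"), (False, True), (False, True), ("A", "B"), ("AB", "BA"))
]


def make_form(form_number: int) -> list:
    """Every form contains all 32 items; forms differ only in the random order."""
    order = ITEMS[:]
    random.Random(form_number).shuffle(order)
    return order


print(f"{math.factorial(32):.2e}")  # 2.63e+35 possible orderings
```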

Pattern Recognition

The pattern recognition test requires the subject to look at a histogram
pattern, and then recognize it among nine other patterns arrayed in a row to
the right of it. The subject underlines the pattern that matches the target
pattern at the left end of the row. A representative item is:

[Figure: representative pattern recognition item, a target histogram at the
left followed by a row of comparison histograms drawn with X characters.]
The test is conducted in two parts, each lasting 90 seconds. Each part
consists of five pages of five items. The preferred score is the number of
correct responses. Data obtained with this test were reported by Shannon
and Carter (1981). A block diagram showing construction of a single item
and the Fortran IV computer program that generates this test are presented
in Appendix E.
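The Appendix E block diagram includes the step "Loop until nine different patterns have been made". This Python sketch follows that step under assumed dimensions. Each histogram has six bars, each bar height is drawn uniformly from 1 to 6, and the row of nine distinct patterns includes exactly one copy of the target. The dimensions and function names are assumptions, not values taken from the report's program.

```python
import random

# Assumed dimensions: six bars per histogram, bar heights 1 to 6, nine choices per row.
N_COLUMNS, MAX_HEIGHT, N_CHOICES = 6, 6, 9


def random_pattern(rng: random.Random) -> tuple:
    """A histogram pattern: one random bar height per column."""
    return tuple(rng.randint(1, MAX_HEIGHT) for _ in range(N_COLUMNS))


def make_item(rng: random.Random) -> tuple:
    """Return (target, row of nine different patterns, index of the match)."""
    target = random_pattern(rng)
    row = [target]
    while len(row) < N_CHOICES:  # loop until nine different patterns have been made
        candidate = random_pattern(rng)
        if candidate not in row:
            row.append(candidate)
    rng.shuffle(row)
    return target, row, row.index(target)


def render(pattern: tuple) -> str:
    """Draw the histogram with X marks, top level first."""
    return "\n".join(
        "".join("X" if h >= level else " " for h in pattern).rstrip()
        for level in range(MAX_HEIGHT, 0, -1)
    )


target, row, answer = make_item(random.Random(2))
print(render(target))
```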


Pattern Comparison Test


The pattern comparison test is procedurally similar to the number
comparison test. It requires the subject to compare two adjacent patterns
of asterisks. The subject writes "S" on a line between them if they are the
same, or "D" if they are different. Some representative items are:


[Figure: representative pattern comparison items, pairs of asterisk
patterns separated by a response line.]


The test is conducted in one part lasting two minutes. There are six lines
of three items on each of eight pages of the test. The preferred score is
the number of correct responses minus the number of incorrect responses.

Data obtained with this test were described by Shannon and Carter (1981).
A block diagram showing construction of a single item and the Fortran IV
computer program for this test are presented in Appendix F. An improved
version of the test, in which the patterns may differ only in the placement
of a single asterisk, has been programmed in BASIC.
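The improved version, where a "different" pair differs only in the placement of a single asterisk, might be generated as in this Python sketch. The 3 x 3 grid and the four asterisks per pattern are assumptions, since the report gives neither. In a "D" item one asterisk is moved to an empty cell, so both patterns keep the same number of asterisks and differ in exactly two cells.

```python
import random

# Assumed layout: asterisks placed in a 3 x 3 grid, four asterisks per pattern.
ROWS, COLS, N_STARS = 3, 3, 4
CELLS = [(r, c) for r in range(ROWS) for c in range(COLS)]


def make_item(rng: random.Random) -> tuple:
    """Return (left, right, answer); a 'D' pair differs by moving exactly one asterisk."""
    left = set(rng.sample(CELLS, N_STARS))
    if rng.random() < 0.5:
        return left, set(left), "S"
    moved = rng.choice(sorted(left))
    destination = rng.choice([cell for cell in CELLS if cell not in left])
    right = (left - {moved}) | {destination}
    return left, right, "D"


def render(cells: set) -> list:
    """One string per grid row, '*' where an asterisk is placed."""
    return ["".join("*" if (r, c) in cells else " " for c in range(COLS)) for r in range(ROWS)]


rng = random.Random(11)
items = [make_item(rng) for _ in range(18)]  # one page: six lines of three items
```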

Letter Search


The letter search test has two parts. In the first part the subject looks
for a particular target letter or number in an array with many rows and five
columns of letters or numbers. The subject marks any row that contains the
target. In the second part of the test there are four target letters or
numbers. In this part the subject marks any row that contains any of the four
targets. Some representative items are:


Part 1                    Part 2

Target: G                 Targets: M G T X

G L N R 7                 K T N L 7
T M T R L                 K L F R H
G L 7 H N                 N F R K H
M 7 H 7 G                 M N 7 F R


Subjects are allowed 90 seconds for part 1 and 3 minutes for part 2. These
times approximate an optimum sampling allocation. The standard deviation of
performance in part 2 is double that in part 1, so part 2 receives twice the
test time. The preferred score is the time per correct response. Data
obtained with this test were described by Shannon and Carter (1981). A block
diagram showing construction of a single item and the Fortran IV computer
program for this test are presented in Appendix G.
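The letter search parts and their answer keys can be generated with the following Python sketch. The character pool `POOL` is hypothetical: it is built from the characters visible in the sample rows, because the lookup table in Appendix G is not reproduced. The row count per part is also hypothetical. The scoring function assumes that "time per correct response" means part time divided by the number of target rows correctly marked.

```python
import random

# Hypothetical character pool; the report's lookup table is not reproduced here.
POOL = "FGHKLMNRTX7"
ROW_WIDTH = 5


def make_part(rng: random.Random, targets: str, n_rows: int) -> tuple:
    """Rows of five characters and the answer key (True where a row holds any target)."""
    rows = ["".join(rng.choice(POOL) for _ in range(ROW_WIDTH)) for _ in range(n_rows)]
    key = [any(t in row for t in targets) for row in rows]
    return rows, key


def time_per_correct(seconds: float, marks: list, key: list) -> float:
    """Assumed scoring: part time divided by the number of target rows correctly marked."""
    correct = sum(1 for m, k in zip(marks, key) if m and k)
    return seconds / correct if correct else float("inf")


rng = random.Random(1974)
part1 = make_part(rng, targets="G", n_rows=60)     # 90 seconds allowed
part2 = make_part(rng, targets="MGTX", n_rows=60)  # 3 minutes allowed
```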

Number Search Test

The number search test has four parts. All four parts require the subject
to look for a target number among other numbers scattered at random
locations on a page (the search page). The specifications for each part are:


Part    Number of Targets    Number of Numbers on the Search Page
 1                                            10
 2                                            10
 3                                            40
 4                                            40


The single target, or the four possible targets, are printed at the top of
the page preceding the search page. Only one target number appears on the
search page, and it appears only once, in a randomly chosen location. The
subject is required to find and mark the target on the search page. A
representative item, with targets 1, 5, 2, 9, is presented on the following
page. The test has 6, 12, 8, and 16 items in parts 1 through 4,
respectively. This allocates test resources in a near-optimum way, given the
variance of performance in each part.

The preferred score is time per correct response. Time to complete all
items is recorded for each part. Data obtained with this test were described
by Shannon and Carter (1981). A block diagram showing construction of a
single item and the Fortran IV computer program for this test are presented
in Appendix H.
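The Appendix H block diagram can be sketched in Python, with X1 = `n_targets` and X2 = `n_background`. It is not a translation of the Fortran IV program, and it makes two changes, both assumptions. First, background digits are drawn only from digits that are not potential targets, which enforces the report's statement that the target appears only once. Second, line and column positions are drawn as distinct cells on the 36 x 120 page, so no two numbers overprint.

```python
import random

LINES, COLUMNS = 36, 120


def make_item(rng: random.Random, n_targets: int, n_background: int) -> tuple:
    """One search page: (potential targets, the target used, [((line, col), digit), ...])."""
    potential_targets = rng.sample(range(10), n_targets)  # X1 different digits 0-9
    # Assumption: background digits exclude the potential targets so the target appears once.
    others = [d for d in range(10) if d not in potential_targets]
    background = [rng.choice(others) for _ in range(n_background)]
    target = rng.choice(potential_targets)
    # Assumption: distinct (line, column) cells, 1-based as in the block diagram.
    all_cells = [(r, c) for r in range(1, LINES + 1) for c in range(1, COLUMNS + 1)]
    cells = rng.sample(all_cells, n_background + 1)
    placed = list(zip(cells, background + [target]))
    return potential_targets, target, placed


def render(placed: list) -> list:
    """Lay the digits out on a 36-line page, one string per printed line."""
    page = [[" "] * COLUMNS for _ in range(LINES)]
    for (line, col), digit in placed:
        page[line - 1][col - 1] = str(digit)
    return ["".join(row).rstrip() for row in page]


potential, target, placed = make_item(random.Random(9), n_targets=4, n_background=40)
page = render(placed)
```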


Eight performance tests for repeated measurements are presented, along
with computer programs to generate the tests. The computer programs can be
used to sample equivalent forms of the tests for any number of occasions of
repeated measurement. The programs also print any specified number of copies
of the alternate forms to provide for multiple subjects. The logic of the
item-sampling procedures and block diagrams of the item-generation method
were discussed.
Appendix B-2, Number Comparison


Number Comparison Test Block Diagram for a Single Item


[Block diagram not reproduced. Surviving label: horizontal sequence, 1 to
String Length, leave space for response, and repeat.]
Code Substitution Test Block Diagram for a Single Item
APPENDIX E
PATTERN  RECOGNITION 
Appendix E-2, Pattern Recognition


Pattern  Recognition  Test  Block  Diagram  for  a  Single  Item 


Loop  until 
nine  different 
patterns  have 
been  made 
PATTERN COMPARISON (NUMSER)
Letter Search Test Block Diagram for a Single Item

Lookup  Table 
Appendix H-2, Number Search


Number Search Test Block Diagram for a Single Item


X1 = the number of potential target numbers
X2 = the number of background numbers

1. Choose X2 + 1 random integers (1,36) for line placement of the X2
   background numbers and the target.

2. Choose X2 + 1 random integers (1,120) for column placement (within
   assigned lines) of the X2 background numbers and the target.

3. Choose X1 different random integers (0,9) to be potential targets.

4. Print the potential targets.

5. Choose X2 random integers (0,9) to be background numbers.

6. Choose one of the potential targets at random (1,X1).

7. Print the X2 background numbers and a single target at the random
   locations specified by the chosen column and row numbers.